2,993 research outputs found

    Normalization of Disease Mentions with Convolutional Neural Networks

    Normalization of disease mentions plays an important role in biomedical natural language processing (BioNLP) applications, such as the construction of biomedical databases. Various disease mention normalization systems have been developed, but state-of-the-art systems either rely on candidate concept generation or do not generalize to concepts not seen during training. This thesis explores the possibility of building a disease mention normalization system that both generalizes to unseen concepts and does not rely on candidate generation. To this end, it is hypothesized that modern neural networks are sophisticated enough to solve this problem. This hypothesis is tested by building a normalization system using deep learning approaches and evaluating its accuracy on the NCBI disease corpus. The system leverages semantic information in the biomedical literature by using continuous vector space representations for the strings of disease mentions and concepts. A neural encoder is trained to encode these strings as vectors, which in theory enables the model to generalize to concepts unseen during training. The encoded strings are used to compare the similarity between concepts and a given mention. Viewing normalization as a ranking problem, the concept with the highest estimated similarity is selected as the predicted concept for the mention. During development, synthetic data is used for pre-training to facilitate learning, and various architectures are explored. While the model succeeds in prediction without candidate concept generation, its performance is not comparable to that of state-of-the-art systems. Normalizing disease mentions without candidate generation while allowing the system to generalize to unseen concepts is not trivial. Further efforts could focus on, for example, testing more neural architectures and using more sophisticated word representations.
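
The ranking step described in this abstract (encode the mention and every concept name, then pick the concept with the highest similarity) can be sketched as follows. This is only an illustration under stated assumptions: the `encode` function is a toy bag of character trigrams standing in for the thesis's trained neural encoder, cosine similarity is an assumed similarity measure, and the concept IDs and names are hypothetical.

```python
import math
from collections import Counter


def encode(text: str, n: int = 3) -> Counter:
    """Toy stand-in for a neural encoder: a bag of character n-grams."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def normalize(mention: str, concepts: dict) -> str:
    """Rank all concepts by similarity to the mention and return the top ID."""
    m = encode(mention)
    return max(concepts, key=lambda cid: cosine(m, encode(concepts[cid])))


# Hypothetical concept inventory (IDs and names are illustrative only).
concepts = {
    "C1": "breast cancer",
    "C2": "colorectal cancer",
    "C3": "cystic fibrosis",
}
predicted = normalize("breast cancers", concepts)
```

Because every concept in the inventory is scored, no separate candidate generation step is needed, and a concept added to `concepts` later can be ranked without retraining the toy encoder.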

    A preliminary study to investigate the expressive syntactic ability of normal speakers

    Grammatical problems are among the most prominent characteristics of speech in persons with aphasia (Gordon, 2006) and progressive aphasic syndromes (Knibb, Woollams, Hodges, & Patterson, 2009). Measures used to investigate grammatical deficits in the discourse performance of persons with aphasia can be roughly classified into two categories: one related to the lexical level, the other to the syntactic level. Most measures in the former category use words to analyse variation in speech performance, such as correct information units (CIUs; Nicholas & Brookshire, 1993) and type-token ratio (TTR), while the measures applied in studies of syntactic ability are more varied. They include the proportion of well-formed sentences, auxiliary scores, the proportion of verbs inflected, and the proportion of obligatory determiners in quantitative production analysis (QPA) (Gordon, 2006), as well as the mean length of syntactic units and the proportion of syntactic units suggested by Lind, Kristoffersen, Moen, and Simonsen (2009). However, these measures describe a person's syntactic ability separately and cannot provide a profile that reveals the pattern of syntactic ability as a continuous picture. To develop a syntactic scoring system that can capture changes in the characteristics of narrative speech, we adopted a concept from studies of child language development (Hsu, 2003) and widened the categories to encompass the imperfect parts of natural speech. The applicability of this scoring system was first tested on the normal population to examine whether its range is suitable for reflecting the expressive syntactic ability of a normal speaker.
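
Of the lexical measures mentioned above, the type-token ratio is the simplest to state: the number of distinct word forms divided by the total number of words. A minimal sketch follows, assuming English text and a naive regular-expression tokenizer; neither the tokenizer nor the sample sentence comes from the study.

```python
import re


def type_token_ratio(text: str) -> float:
    """Distinct word forms (types) divided by total words (tokens)."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0


sample = "the boy kicked the ball and the ball rolled away"
ttr = type_token_ratio(sample)  # 7 types over 10 tokens
```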

    MAT: A Multi-strength Adversarial Training Method to Mitigate Adversarial Attacks

    Recent work has revealed that deep neural networks (DNNs) are vulnerable to so-called adversarial attacks, in which input examples are intentionally perturbed to fool DNNs. In this work, we revisit adversarial training, the DNN training process that adds adversarial examples to the training dataset so as to improve a DNN's resilience to adversarial attacks. Our experiments show that different adversarial strengths, i.e., perturbation levels of adversarial examples, have different working zones in which they resist the attack. Based on this observation, we propose a multi-strength adversarial training method (MAT) that combines adversarial training examples of different adversarial strengths to defend against adversarial attacks. Two training structures, mixed MAT and parallel MAT, are developed to facilitate tradeoffs between training time and memory occupation. Our results show that MAT can substantially reduce the accuracy degradation of deep learning systems under adversarial attacks on MNIST, CIFAR-10, CIFAR-100, and SVHN. Comment: 6 pages, 4 figures, 2 tables.
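
The idea of combining adversarial examples of several strengths into one training set can be illustrated with a small sketch. Everything specific here is an assumption rather than the paper's method: the attack is taken to be the fast gradient sign method (FGSM), the model is a logistic regression instead of a DNN, clean examples are kept in the batch, and the weights, data, and strengths are hypothetical. It does not reproduce the mixed or parallel MAT structures.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fgsm(x, y, w, b, eps):
    """Shift x by eps along the sign of the loss gradient (assumed FGSM attack)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    # For logistic loss, d(loss)/dx_i = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


def multi_strength_batch(batch, w, b, strengths):
    """Clean examples plus one adversarial copy of each example per strength."""
    mixed = list(batch)
    for eps in strengths:
        mixed += [(fgsm(x, y, w, b, eps), y) for x, y in batch]
    return mixed


# Hypothetical toy model and data.
w, b = [2.0, -1.0], 0.0
batch = [([1.0, 0.5], 1), ([-1.0, 0.2], 0)]
augmented = multi_strength_batch(batch, w, b, strengths=[0.1, 0.2, 0.3])
```

Each training step would then fit the model on `augmented` instead of `batch`, so the model sees weak and strong perturbations of the same inputs side by side.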

    Deep vertebrate roots for mammalian KRAB zinc-finger transcription factors

    KRAB-associated C2H2 zinc-finger (KRAB-ZNF) proteins are the products of a rapidly evolving gene family that traces back to early tetrapods but has expanded dramatically to generate an unprecedented level of species-specific diversity. While most attention has been focused on the more recently evolved primate KRAB-ZNF genes, the vertebrate roots of the KRAB-ZNF families have remained mysterious. We recently mined ZNF loci from seven sequenced genomes (opossum, chicken, zebra finch, lizard, frog, mouse, and human) and found hundreds of KRAB-ZNF proteins in every species we examined, but only three human genes were found with clear orthologs in non-mammalian vertebrates. These three genes, ZNF777, ZNF282, and ZNF783, are members of an ancient familial cluster and encode proteins with similar domain structures. They encode a noncanonical KRAB domain that is similar to an ancient domain prevalent in non-mammalian species. In contrast to the mammalian KRAB, which is thought to function as a potent repressor, this ancient domain serves as a transcriptional activator. Our evolutionary analysis confirmed the ancient provenance of this activating KRAB and revealed the independent expansion of KRAB-ZNFs in every vertebrate lineage. This finding led us to ask: what are the functions of these ancient family members, and why, of such a large and diverse family group, were these three genes conserved so fastidiously over hundreds of millions of years? In chapter 2, I report the regulatory function of ZNF777, combining chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-seq) with siRNA knockdown experiments to determine genome-wide binding sites, a distinct binding motif, and predicted targets for the protein in human BeWo choriocarcinoma cells. Genes neighboring ZNF777 binding sites can be either up- or down-regulated, suggesting a complex regulatory role. Our studies revealed that some of this complexity is due to the generation of HUB-containing and HUB-minus isoforms, which are predicted to have different regulatory activities. Based on these experiments, we hypothesize that ZNF777 regulates pathways best known for their roles in neurogenesis and axon pathfinding, but also recently shown to play critical roles in placental development. Since ZNF777 is also expressed in embryonic brain, we sought to further investigate the functional role of this ancient gene in neuron development. In chapter 3, I show that mouse Zfp777 is expressed in neuronal stem cells (NSCs) cultured from early mouse embryos, with a pattern that changes over the course of neuron differentiation in vitro. Using the NSC platform, I characterized the binding landscape of Zfp777 in undifferentiated NSCs. To circumvent the roadblock posed by the lack of a ChIP-grade antibody for the mouse protein, I exploited the CRISPR-Cas9 technique to tag the endogenous Zfp777 protein with FLAG epitopes. Our results revealed a novel Zfp777 binding motif that bears significant similarity to a motif predicted in in vitro studies, and showed that Zfp777 binds to promoters of genes encoding transcription factors, Wnt and TGF-beta pathway components, and proteins related to neuron development and axon guidance. Since these same functions were also found to be regulated by ZNF777 in BeWo cells, these results suggest that the mouse Zfp777 and human ZNF777 proteins regulate similar genes and pathways, most classically associated with axon guidance, in diverse tissues.

    A Case Study for Exploring Dental Patients’ Preferred Roles in Taiwan

    The purpose of this study was to explore the roles that dental patients in Taiwan prefer to play in treatment decision making. A convenience sample of 66 patients, 26 recruited from one dental clinic and 40 from one medical center, was interviewed, and their preferences for participation in treatment decision making were established using the Control Preference Scale (CPS), a measurement tool designed to elicit decision-making preferences. In addition, unfolding theory provided a means of analyzing the data so that the degree of control preferred by each patient could be established. The study found that nearly 70% of clinic patients perceived themselves as having a passive role in treatment decision making, compared with 50% of patients in the medical center. Further, the collaborative role was the most commonly preferred, but an active role was more commonly perceived in the clinic than in the medical center. Finally, the implications of the results for patient participation are discussed.

    A Global Decision Support System for Garment Manufacturing by Using Genetic Algorithm

    In recent years, every industry has had to make decisions in the context of global markets, especially industries with a lower level of technology. These industries struggle to earn profits in perfectly competitive markets. Decision makers sometimes have to decide how to allocate orders among different factories because of the distinct requirements of individual customers. A way is needed to help managers make decisions and allocate orders effectively. This study develops a decision support system (DSS) to help the managers and decision makers of a real garment manufacturer in Taiwan decide on order allocation. A genetic algorithm (GA) is used as the analysis tool, and the results are shown as graphs to assist managers in decision making. With the decision support system, managers and decision makers can decide on order allocation quickly and reduce costs. The system presents the lowest-cost result visually, so managers can allocate orders effectively on the basis of the graphs. With this information, decision makers can make different decisions in different situations for different goals. The system was developed to be easy to use and suitable for the garment industry and other similar manufacturing industries.
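
How a genetic algorithm might search for a low-cost order allocation can be sketched as follows. This is a generic illustration, not the system described in the abstract: the chromosome encoding (one factory index per order), the elitist selection, one-point crossover, per-gene mutation, all parameter values, and the cost matrix are assumptions, and real constraints such as factory capacity are left out.

```python
import random


def total_cost(allocation, cost):
    """Sum of cost[order][factory] for the factory chosen for each order."""
    return sum(cost[o][f] for o, f in enumerate(allocation))


def genetic_allocate(cost, pop_size=30, generations=60, mutation_rate=0.1, seed=0):
    """Search for a low-cost allocation; requires at least two orders."""
    rng = random.Random(seed)
    n_orders, n_factories = len(cost), len(cost[0])
    # A chromosome assigns one factory index to each order.
    population = [[rng.randrange(n_factories) for _ in range(n_orders)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda a: total_cost(a, cost))
        survivors = population[:pop_size // 2]  # keep the cheaper half
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_orders)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: occasionally reassign an order to a random factory.
            child = [rng.randrange(n_factories) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = survivors + children
    return min(population, key=lambda a: total_cost(a, cost))


# Hypothetical cost of producing each of 4 orders at each of 3 factories.
cost = [
    [8, 5, 9],
    [4, 7, 6],
    [6, 6, 3],
    [7, 2, 8],
]
best = genetic_allocate(cost)
best_cost = total_cost(best, cost)
```

Without capacity constraints the cheapest allocation could be read directly from the row minima, so the sketch only shows the mechanics; a GA becomes useful once constraints or multiple goals make the search space harder to enumerate.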

    Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022)

    Automatic grouping of textual answers has the potential of allowing batch grading, but is challenging because the answers, especially longer essays, have many claims. To explore the feasibility of grouping together answers based on their semantic meaning, this paper investigates the grouping of short textual answers, proxies of single claims. This is approached as a paraphrase identification task, where neural and non-neural sentence embeddings and a paraphrase identification model are tested. These methods are evaluated on a dataset consisting of over 4000 short textual answers from various disciplines. The results map out the suitable question types for the paraphrase identification model and those for the neural and non-neural methods.

    Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022)

    Previous work concerning measurement of second language learners has tended to focus on the knowledge of small numbers of words, often geared towards measuring vocabulary size. This paper presents a “tall” dataset containing information about a few learners’ knowledge of many words, suitable for evaluating Vocabulary Inventory Prediction (VIP) techniques, including those based on Computerised Adaptive Testing (CAT). In comparison to previous comparable datasets, the learners are from varied backgrounds, so as to reduce the risk of overfitting when used for machine learning based VIP. The dataset contains both a self-rating test and a translation test, used to derive a measure of reliability for learner responses. The dataset creation process is documented, and the relationships between variables concerning the participants, such as their completion time, their language ability level, and the triangulated reliability of their self-assessment responses, are analysed. The word list is constructed by taking into account the extensive derivational morphology of Finnish, and infrequent words are included in order to account for explanatory variables beyond word frequency.